- Created: by Isaac Sebenius, inspired by work of Eduarda Centeno & Fernando Santos (see Acknowledgements), with small modifications by Rik Henson
- Date: November 2024
- conda environment: This uses the mri environment
Network (Graph) Analysis¶
This notebook explains some concepts in network analysis (graph theory) using functional and structural connectomes.
This notebook uses data that you need to download: get the contents of the networks directory from https://cloud.mrc-cbu.cam.ac.uk/index.php/s/3AFA0yF7QK9qIaO and put them into the local networks directory created below. The password for this cloud directory will be given by the course organiser.
0. Getting Ready¶
The usual python libraries, plus some new ones: networkx, nxviz and community (python-louvain)
# Basic data manipulation and visualisation libraries
import os
import numpy
import matplotlib.pyplot as plt
# to show plots in cell
%matplotlib inline
import seaborn
import pandas
from scipy import stats
# For some graphics below
from IPython.display import Image
from IPython.core.display import HTML
# Stop warnings about new versions of packages below
import warnings
warnings.filterwarnings('ignore')
# For loading serialised python objects (byte streams)
import pickle
# Network Libraries
import networkx
from nxviz import circos
import community
And set-up our input and output directory:
wd = '/mnt/c/Users/rh01/PythonNeuroimagingCourse/FaceRecognition/' # <-- CHANGE TO YOURS
out_dir = os.path.join(wd, 'networks')
if not os.path.exists(out_dir):
os.makedirs(out_dir)
os.chdir(out_dir)
print(f"Working directory currently {os.getcwd()}")
Working directory currently /mnt/c/Users/rh01/PythonNeuroimagingCourse/FaceRecognition/networks
1. Importing data & exploring connectivity matrices¶
There are three connectomes: 1) from fMRI, 2) from structural similarity (called Morphometric INverse Divergence, or MIND) and 3) from diffusion tensor imaging (DTI), for each of 100 subjects.
Our first step will be to import the data:
#Load individual data for 100 subjects from the HCP dataset
fmri_connectivity = pickle.load(open('FCs.HCP.pkl','rb'))
mind_connectivity = pickle.load(open('MIND.HCP.pkl','rb'))
dti_connectivity = pickle.load(open('DTI.HCP.pkl','rb'))
print(f"{fmri_connectivity.shape[0]} subjects with {fmri_connectivity.shape[1]} x {fmri_connectivity.shape[2]} connectomes")
# Create a mask for connectome
mask = numpy.ones(fmri_connectivity[0].shape)
mask[numpy.diag_indices(360)] = numpy.nan
# Compute group-averaged networks
mean_fmri_connectivity = numpy.mean(fmri_connectivity, axis = 0)*mask
mean_mind_connectivity = numpy.mean(mind_connectivity, axis = 0)*mask
mean_dti_connectivity = numpy.mean(dti_connectivity, axis = 0)*mask
100 subjects with 360 x 360 connectomes
We also need to know the names and coordinates of the 360 ROIs (and will assign them to random subnetworks just for ease of illustration):
# Load region information
region_information = pandas.read_csv('HCP-coordinates.csv')
# Obtaining name of areas according to matching file
lineList = list(region_information['regionName'].values.flatten())
# Obtaining a random list of numbers to simulate subnetworks -- THESE NUMBERS DO NOT CORRESPOND TO ANY REAL CLASSIFICATION
sublist = numpy.array(numpy.loadtxt('HCP_Yeo_symmetric.txt'))
# Obtaining a list of colors that will match the random subnetwork classification for further graphs
cmap_dict = dict(zip(list(range(8)), seaborn.color_palette("tab10", n_colors = 8)))
colorlist = numpy.array([cmap_dict[x] for x in sublist])
colornumbs = numpy.array(colorlist.copy())
Let's see what these group-averaged networks look like, using standard heatmaps:
fig, ax = plt.subplots(1, 3, figsize = (15, 5))
seaborn.heatmap(mean_fmri_connectivity, ax = ax[0], square = True, cbar_kws = {"shrink": 0.75})
seaborn.heatmap(mean_mind_connectivity, ax = ax[1], square = True, cbar_kws = {"shrink": 0.75})
seaborn.heatmap(mean_dti_connectivity, ax = ax[2], square = True, cbar_kws = {"shrink": 0.75})
ax[0].set_title('Mean fMRI Connectivity')
ax[1].set_title('Mean MIND Connectivity')
ax[2].set_title('Mean DTI Connectivity')
plt.show()
The fMRI and DTI connectomes show high connectivity between homologous ROIs in the left and right hemispheres, but MIND does not (instead showing higher similarity within each hemisphere).
We can also look at the similarity (correlation) between pairs of these modalities. Note that we are only correlating the upper triangle of the matrix, since the matrices are symmetrical and the diagonal is meaningless here.
print('DTI - fMRI connectivity edgewise correlation:', str(round(stats.pearsonr(mean_fmri_connectivity[numpy.triu_indices(360, k = 1)], mean_dti_connectivity[numpy.triu_indices(360, k = 1)])[0], 2)))
print('MIND - fMRI edgewise correlation:', str(round(stats.pearsonr(mean_fmri_connectivity[numpy.triu_indices(360, k = 1)], mean_mind_connectivity[numpy.triu_indices(360, k = 1)])[0], 2)))
print('DTI - MIND edgewise correlation:', str(round(stats.pearsonr(mean_mind_connectivity[numpy.triu_indices(360, k = 1)], mean_dti_connectivity[numpy.triu_indices(360, k = 1)])[0], 2)))
DTI - fMRI connectivity edgewise correlation: 0.17
MIND - fMRI edgewise correlation: 0.17
DTI - MIND edgewise correlation: 0.11
which shows that DTI and MIND are least similar.
2. Deciding on edges¶
When performing network analysis on brain data, some crucial decisions have to be made.
One is whether to use all network connections - including low-weight edges (sometimes considered spurious connections) - or to set an arbitrary threshold and keep only edges above a specific correlation value. This step can be done in different ways: based solely on the correlation threshold (as done here), or based on network density (e.g., by keeping only the 20% strongest correlations).
A second decision is whether the edges will be weighted (keeping their continuous connectivity values) or unweighted (binarised into present or absent).
Another decision is how to deal with negative weights in weighted networks (i.e., are they meaningful?).
These issues are covered in Bassett & Sporns (2017)
Later, we will threshold the top weighted connections, ignoring negative/low connections.
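Density-based thresholding (keeping, say, only the strongest 20% of edges) can be sketched with plain numpy. This is a minimal illustration; the helper name `keep_top_density` and the toy matrix are our own, not part of the course data:

```python
import numpy

def keep_top_density(matrix, density=0.20):
    """Keep only the strongest `density` fraction of off-diagonal edges."""
    m = matrix.copy()
    triu = m[numpy.triu_indices(m.shape[0], k=1)]
    cutoff = numpy.quantile(triu, 1 - density)  # weight at the (1-density) centile
    m[m < cutoff] = 0
    return m

# Toy symmetric 4-node matrix with 6 unique edges; keep the top 20%
toy = numpy.array([[0. , 0.1, 0.2, 0.9],
                   [0.1, 0. , 0.3, 0.4],
                   [0.2, 0.3, 0. , 0.5],
                   [0.9, 0.4, 0.5, 0. ]])
sparse = keep_top_density(toy, density=0.20)
print(numpy.count_nonzero(sparse[numpy.triu_indices(4, k=1)]))  # -> 2 edges survive (0.5 and 0.9)
```

With the group-mean matrices above, the same function would give every modality the same number of surviving edges, unlike a fixed-value threshold.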
Figure 1 provides a schematic summary of the types of network edges (note you will need an internet connection to download the image):

Figure 1. Types of networks. (A) A binary directed graph. (B) Binary, undirected graph. In binary graphs, the presence of a connection is signified by a 1 or 0 otherwise. (C) A representation of graph F as a network of brain areas. (D) A weighted, directed graph. (F) A weighted, undirected graph. In a weighted graph, the strength of the connections is represented by a number in the interval [0, 1]. (G) A connectivity matrix of C and F. Source: Part of the image was obtained from Smart Servier Medical Art.
To determine what type of edge is best, it can be useful to generate some plots (e.g., the heatmaps above for matrix visualisation, and distribution plots of edge weights) to facilitate data comprehension and flag potential artefacts. In brain networks, we expect mostly weak edges and a smaller proportion of strong ones. Let's see:
# Weight distribution plot
def MinMax(X):
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print('Original Min:', X.min(axis=0), 'Max:', X.max(axis=0))
return X_scaled
print('fMRI'); centiled_fmri_edges = MinMax(mean_fmri_connectivity[numpy.triu_indices(360, k =1)])
print('DTI'); centiled_dti_edges = MinMax(mean_dti_connectivity[numpy.triu_indices(360, k =1)])
print('MIND'); centiled_mind_edges = MinMax(mean_mind_connectivity[numpy.triu_indices(360, k =1)])
fig, ax = plt.subplots(1, 3, figsize = (15, 5))
seaborn.histplot(centiled_fmri_edges, ax = ax[0])
seaborn.histplot(centiled_dti_edges, ax = ax[1])
seaborn.histplot(centiled_mind_edges, ax = ax[2])
ax[0].set_xlabel('Scaled functional connectivity')
ax[1].set_xlabel('Scaled DTI connectivity')
ax[2].set_xlabel('Scaled MIND connectivity')
plt.show();
fMRI
Original Min: -0.22500947415000014 Max: 0.8653164699999999
DTI
Original Min: 0.0 Max: 0.7328988559347287
MIND
Original Min: 0.02737757078865628 Max: 0.5063973059739767
You can see that the distributions tend to be positively skewed (particularly fMRI), i.e., there is a long tail of weights larger than expected from a Gaussian. The DTI distribution also illustrates zero-inflation, i.e., an excess of weights of exactly 0, which may be meaningful (or an artefact). The MIND distribution is bimodal, with a lot of weak connections. Depending on your knowledge of what the connection weights mean, you could for example choose (centiled) thresholds from the above distributions to remove zero or weak connections.
3. Graph-theoretic summary measures¶
The metrics that we will cover here are:
- Density
- Degree/Strength
- Centrality: Eigenvector, Betweenness, Closeness, Degree, Page Rank
- Path length
- Modularity
- Assortativity
- Clustering coefficient
Each of these metrics has its own requirements for computation. For example, it is not possible to accurately compute closeness centrality or the average shortest path for fragmented networks (i.e., networks containing disconnected subsets of nodes). Keep that in mind when thinking about thresholding the connectivity matrix.
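A quick way to guard against this is to test for fragmentation before computing path-based metrics; a small sketch on toy graphs:

```python
import networkx

G_full = networkx.path_graph(5)            # connected: closeness is well-defined
G_frag = networkx.Graph([(0, 1), (2, 3)])  # two components: fragmented

print(networkx.is_connected(G_full))                 # True
print(networkx.is_connected(G_frag))                 # False
print(networkx.number_connected_components(G_frag))  # 2
```

Running `networkx.is_connected` on a thresholded connectome tells you whether closeness and average shortest path are even computable.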
Figure 2 illustrates some graph-theoretical metrics:
Figure 2. Graph theoretical metrics. (A) A representation of a graph indicating centralities. (B) Representation of modularity and clustering coefficient. (C) The shortest path between vertices A and B. (D) The minimum spanning tree.
3.1 Creating a graph¶
Let's create an undirected graph from the group-mean fMRI connectivity matrix, using the centiled weights above and the networkx library:
centiled_fc = numpy.zeros(mean_fmri_connectivity.shape)
# Set upper triangle
centiled_fc[numpy.triu_indices(360, k = 1)] = centiled_fmri_edges
# Mirror to make symmetrical
centiled_fc += centiled_fc.T
matrix = centiled_fc.copy()
G = networkx.from_numpy_array(matrix)
# Removing self-loops
G.remove_edges_from(list(networkx.selfloop_edges(G)))
3.2 Network density¶
Definition: A graph's density is the ratio between the number of edges and the total number of possible edges. You can get more information by uncommenting the docstring print statements in the following cells.
Clearly, in an all-to-all connected graph the density will be maximal (1), whereas for a graph without edges it will be 0. Here, just for the sake of demonstration, we will apply different thresholds to show how density changes.
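As a sanity check, for an undirected graph the density is 2E / (N(N-1)), which matches `networkx.density` on toy graphs:

```python
import networkx

G = networkx.complete_graph(5)   # all-to-all: density 1.0
H = networkx.path_graph(5)       # 4 edges out of 10 possible

n, e = H.number_of_nodes(), H.number_of_edges()
print(2 * e / (n * (n - 1)))     # 0.4, by hand
print(networkx.density(H))       # 0.4, via networkx
print(networkx.density(G))       # 1.0
```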
#print(networkx.density.__doc__)
# Create graphs for comparison
matrix2 = matrix.copy()
matrix3 = matrix.copy()
# Create sparser graphs
matrix2[matrix2<=0.50] = 0
matrix3[matrix3<=0.75] = 0
alltoallG = G
st50G = networkx.from_numpy_array(matrix2)
st25G = networkx.from_numpy_array(matrix3)
st50G.remove_edges_from(list(networkx.selfloop_edges(st50G)))
st25G.remove_edges_from(list(networkx.selfloop_edges(st25G)))
# Compute densities
alltoall = networkx.density(G)
st50 = networkx.density(st50G)
st25 = networkx.density(st25G)
names = ['All-To-All', '> 0.5', '> 0.75']
values = [alltoall, st50, st25]
dict(zip(names, values))
{'All-To-All': 0.999984524914887,
'> 0.5': 0.11415970287836583,
'> 0.75': 0.004085422469823584}
3.3 Nodal degree/strength¶
Definition: In undirected weighted networks, node strength is the sum of the connectivity weights of the edges attached to each node. It is a primary metric for identifying how important a node is in the graph. A normalisation (dividing the weights by N-1) can be applied to make the output value more intuitive.
In degree computation, it is also common to compute the mean degree of the network, which is the sum of node degrees divided by the total number of nodes. We will use the full network, but you can see how this changes by using the sparser networks created above instead:
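For a weighted undirected graph, node strength is simply the row sum of the adjacency matrix, which gives an easy cross-check of `G.degree(weight='weight')` (the toy matrix below is our own, not course data):

```python
import numpy
import networkx

# Toy symmetric weighted adjacency matrix for 3 nodes
W = numpy.array([[0. , 0.5, 0.2],
                 [0.5, 0. , 0.8],
                 [0.2, 0.8, 0. ]])
G = networkx.from_numpy_array(W)

row_sums = W.sum(axis=1)                     # strengths by hand: row sums
strengths = dict(G.degree(weight='weight'))  # strengths via networkx
print(row_sums)      # the two computations agree
print(strengths)
```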
#print(networkx.degree.__doc__)
G = alltoallG
#G=st50G # if you want to try sparser network instead
strength = G.degree(weight='weight')
strengths = {node: val for (node, val) in strength}
networkx.set_node_attributes(G, dict(strength), 'strength') # Add as nodal attribute
# Normalized node strength values 1/N-1
normstrengths = {node: val * 1/(len(G.nodes)-1) for (node, val) in strength}
networkx.set_node_attributes(G, normstrengths, 'strengthnorm') # Add as nodal attribute
# Computing the mean degree of the network
normstrengthlist = numpy.array([val * 1/(len(G.nodes)-1) for (node, val) in strength])
mean_degree = numpy.sum(normstrengthlist)/len(G.nodes)
print(f"Mean degree is {mean_degree}")
seaborn.histplot(normstrengthlist, kde=True)  # distplot is deprecated in recent seaborn
plt.xlabel('Normalised Strength Values'); plt.ylabel('Counts');
Mean degree is 0.36022342222188036
It is difficult to directly interpret the normalised strength values, but if we had binarised our graph instead, it would correspond to the mean degree, i.e. the average number of other nodes connected to each node.
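To make that interpretation concrete, here is a small sketch that binarises a toy weighted matrix at an arbitrary threshold of 0.5 (our choice, for illustration) and computes the mean degree:

```python
import numpy
import networkx

rng = numpy.random.default_rng(0)
W = rng.random((10, 10))
W = (W + W.T) / 2            # make the toy weights symmetric
numpy.fill_diagonal(W, 0)

B = (W > 0.5).astype(int)    # binarise: edge present (1) or absent (0)
Gb = networkx.from_numpy_array(B)
mean_deg = sum(d for _, d in Gb.degree()) / Gb.number_of_nodes()
print(mean_deg)              # average number of neighbours per node
```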
3.4 Centralities¶
Centralities are frequently used to understand which nodes occupy critical positions in the network.
Definition: The "degree centrality" for a node v is the fraction of nodes it is connected to. This metric is the same as node degree, so it will not be computed again.
Definition: In weighted graphs, the "closeness centrality" of a node v is the reciprocal of the sum of the shortest weighted path distances from v to all N-1 other nodes. An important thing to note about this metric is that a node with many low-weight edges can have the same centrality as a node with only a few high-weighted edges.
Definition: "Betweenness centrality" of a node v is the sum of the fraction of all-pairs shortest paths that pass through v.
Definition: "Eigenvector Centrality": Eigenvector centrality computes the centrality for a node based on its neighbours' centrality. It takes into account not only quantity (e.g., degree centrality) but also quality. If a node is linked to many nodes that also display a high degree, that node will have high eigenvector centrality.
Definition: "PageRank" computes a ranking of the nodes in the graph G based on the incoming links' structure. (This is the basis of some web search engines.)
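The differences between these centralities are easiest to see on a toy graph such as a star, where the hub is maximally central by every measure:

```python
import networkx

G = networkx.star_graph(4)   # node 0 is the hub; nodes 1-4 are leaves

print(networkx.degree_centrality(G)[0])       # 1.0: hub connects to all others
print(networkx.closeness_centrality(G)[0])    # 1.0: hub is one step from everyone
print(networkx.betweenness_centrality(G)[0])  # 1.0: all leaf-to-leaf paths pass the hub
pr = networkx.pagerank(G)
print(max(pr, key=pr.get))                    # 0: the hub ranks highest
```

In real brain networks the measures diverge, which is why comparing them (or comparing modalities on one of them) is informative.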
Again, you can get more information by uncommenting the docstring print statements in the following cells.
We'll just plot closeness and betweenness centrality for the moment:
#print(networkx.closeness_centrality.__doc__)
# The function accepts an argument 'distance' that, in correlation-based networks, must be seen as the inverse ...
# of the weight value. Thus, a high correlation value (e.g., 0.8) means a shorter distance (i.e., 0.2).
G_distance_dict = {(e1, e2): 1 / abs(weight) for e1, e2, weight in G.edges(data='weight')}
# Then add them as attribute to the graph edges
networkx.set_edge_attributes(G, G_distance_dict, 'distance')
# Computation of Closeness Centrality
closeness = networkx.closeness_centrality(G, distance='distance')
# Now we add the closeness centrality value as an attribute to the nodes
networkx.set_node_attributes(G, closeness, 'closecent')
# Closeness Centrality Histogram
seaborn.histplot(list(closeness.values()), kde=True);  # distplot is deprecated in recent seaborn
plt.xlabel('Centrality Values'); plt.ylabel('Counts'); plt.title('Closeness Centrality');
#print(networkx.betweenness_centrality.__doc__)
betweenness = networkx.betweenness_centrality(G, weight='distance', normalized=True)
# Optionally add it as an attribute to the nodes
#networkx.set_node_attributes(G, betweenness, 'bc')
# Betweenness centrality Histogram
seaborn.histplot(list(betweenness.values()))
plt.xlabel('Centrality Values'); plt.ylabel('Counts'); plt.title('Betweenness Centrality');
Only a few nodes have high betweenness centrality, and these nodes are likely to have a big influence on the rest of the network (e.g., in terms of passing information in functional connectivity).
You can plot eigenvector centrality and PageRank if you uncomment the lines below, but there is not a lot we can take from these without comparison with other networks (e.g., you could compare them with those from the DTI or MIND networks).
#print(networkx.eigenvector_centrality.__doc__)
eigen = networkx.eigenvector_centrality(G, weight='weight')
# Now we add it as an attribute to the nodes
networkx.set_node_attributes(G, eigen, 'eigen')
# Eigenvector centrality Histogram
#seaborn.histplot(list(eigen.values()));
#plt.xlabel('Centrality Values'); plt.ylabel('Counts'); plt.title('Eigenvector Centrality');
#print(networkx.pagerank.__doc__)
pagerank = networkx.pagerank(G, weight='weight')
# Add as attribute to nodes
networkx.set_node_attributes(G, pagerank, 'pg')
# Page Rank Histogram
#seaborn.histplot(list(pagerank.values()));
#plt.xlabel('Pagerank Values'); plt.ylabel('Counts'); plt.title('Pagerank');
If you like, you can plot all the above metrics against each other (how redundant are they?) or repeat them with different sparsity levels.
3.5 Shortest path¶
Definition: The "shortest path" is the smallest distance between two nodes in a graph. In a weighted graph it is obtained by the minimum sum of weights.
Definition: The "average path length" is the average of the shortest path lengths over all possible pairs of network nodes. It is a measure of the efficiency of information or mass transport on a network.
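Before running these on the full connectome, a hand-checkable toy example (our own) shows the conventions:

```python
import networkx

G = networkx.Graph()
G.add_weighted_edges_from([('A', 'B', 1.0), ('B', 'C', 1.0), ('A', 'C', 3.0)])

# Going A->B->C (total 2.0) beats the direct A->C edge (3.0)
print(networkx.dijkstra_path(G, 'A', 'C'))         # ['A', 'B', 'C']
print(networkx.dijkstra_path_length(G, 'A', 'C'))  # 2.0
# Mean over the 3 pairs: (1 + 1 + 2) / 3
print(networkx.average_shortest_path_length(G, weight='weight'))
```

Note that in a toy like this the edge attribute is the distance itself; for the correlation-based network below we first convert weights to distances (1/weight).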
There are several different algorithms for calculating these metrics efficiently - below we use Dijkstra's method, but you can try others by uncommenting them (and timing them with %timeit if you like):
#print(networkx.shortest_path_length.__doc__)
# This is a versatile version of the ones below, in which source and target can optionally be specified. Remove the hashtag comment to use this version.
#shortest_paths = networkx.shortest_path_length(G, weight='distance')
# This one can also be used for a specific source and target:
#print(networkx.dijkstra_path_length.__doc__)
print(networkx.dijkstra_path_length(G, source=20, target=25, weight='distance'))
# Whereas this one is for all pairs. Remove the hashtag comment to use this version.
#print(networkx.all_pairs_dijkstra_path_length.__doc__)
#path_lengths = networkx.all_pairs_dijkstra_path_length(G, weight='distance')
#print(networkx.average_shortest_path_length.__doc__)
print(networkx.average_shortest_path_length(G, weight='distance'))
2.98380660767069
Thus the average shortest weighted path between two nodes in the fMRI network is around 3, in units of the inverse-weight distance defined above.
3.6 Module/community-level metrics¶
Definition: Modularity compares the number of edges inside a cluster with the expected number of edges that one would find if the network were connected randomly but with the same number of nodes and node degrees. It is used to identify strongly connected subsets, i.e., modules or 'communities'. Again, there are several algorithms for defining these; we will use the Louvain algorithm.
Definition: Assortativity measures the similarity of connections in the graph with respect to the node degree.
Definition: The Efficiency of a pair of nodes in a graph is the multiplicative inverse of the shortest path distance between the nodes. More efficient means a shorter average path between nodes.
Definition: Clustering Coefficient is a measure of the tendency for any two neighbours of a node to be directly connected. According to Networkx's documentation, weighted graphs' clustering coefficient is defined as the geometric average of the subgraph edge weights.
Definition: A Small World Network is characterized by a small average shortest path length, and a large clustering coefficient. Small-worldness is commonly measured with the coefficient sigma or omega. Both coefficients compare the average clustering coefficient and shortest path length of a given graph against the same quantities for an equivalent random or lattice graph.
Definition: The Minimum Spanning Tree is the backbone of a network, i.e. the minimum set of edges necessary to ensure that paths exist between all nodes. Several algorithms can build the spanning tree; NetworkX uses Kruskal's algorithm by default. Briefly, this algorithm ranks the edges by distance, adds the shortest first, and skips any edge whose addition would form a cycle.
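These module-level ideas can be checked on a toy 'barbell' graph (two cliques joined by one edge), where the community structure is unambiguous. The notebook itself uses the python-louvain `community` package; for a self-contained sketch we use NetworkX's own community functions (available in recent NetworkX versions):

```python
import networkx
from networkx.algorithms import community as nx_comm

G = networkx.barbell_graph(5, 0)   # two 5-cliques joined by a single bridge edge

parts = nx_comm.louvain_communities(G, seed=0)
print(len(parts))                             # 2: one community per clique
print(round(nx_comm.modularity(G, parts), 3)) # ~0.452: well above chance
print(networkx.average_clustering(G))         # ~0.92: cliques are fully clustered
```

Assortativity and efficiency can be probed the same way, e.g. `networkx.degree_assortativity_coefficient(G)` and `networkx.global_efficiency(G)`.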
We will start by finding the communities in the fMRI network. For this we will use the network thresholded at 0.5 (otherwise measures like efficiency will be close to their maximum of 1).
#print(community.best_partition.__doc__)
part = community.best_partition(st50G, weight='weight')
coms = set(part.values())
print(f"{len(coms)} communities found: {coms}")
# List each community associated with each node
print(part)
47 communities found: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46}
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 1, 8: 5, 9: 5, 10: 4, 11: 0, 12: 4, 13: 4, 14: 0, 15: 0, 16: 0, 17: 0, 18: 0, 19: 0, 20: 0, 21: 0, 22: 1, 23: 4, 24: 4, 25: 5, 26: 1, 27: 4, 28: 4, 29: 4, 30: 4, 31: 4, 32: 4, 33: 4, 34: 1, 35: 1, 36: 5, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 5, 43: 5, 44: 5, 45: 1, 46: 0, 47: 0, 48: 5, 49: 1, 50: 1, 51: 1, 52: 1, 53: 1, 54: 1, 55: 6, 56: 7, 57: 5, 58: 5, 59: 4, 60: 4, 61: 4, 62: 4, 63: 4, 64: 8, 65: 4, 66: 4, 67: 4, 68: 4, 69: 4, 70: 4, 71: 4, 72: 4, 73: 4, 74: 4, 75: 4, 76: 5, 77: 4, 78: 4, 79: 4, 80: 5, 81: 4, 82: 5, 83: 4, 84: 5, 85: 4, 86: 4, 87: 4, 88: 9, 89: 5, 90: 10, 91: 11, 92: 4, 93: 5, 94: 5, 95: 4, 96: 4, 97: 1, 98: 1, 99: 1, 100: 1, 101: 1, 102: 1, 103: 1, 104: 5, 105: 1, 106: 5, 107: 5, 108: 12, 109: 5, 110: 13, 111: 5, 112: 14, 113: 1, 114: 5, 115: 5, 116: 15, 117: 16, 118: 17, 119: 0, 120: 18, 121: 4, 122: 1, 123: 1, 124: 19, 125: 5, 126: 4, 127: 4, 128: 4, 129: 4, 130: 4, 131: 4, 132: 4, 133: 20, 134: 5, 135: 5, 136: 0, 137: 1, 138: 1, 139: 0, 140: 0, 141: 5, 142: 4, 143: 4, 144: 5, 145: 5, 146: 5, 147: 4, 148: 4, 149: 4, 150: 0, 151: 0, 152: 0, 153: 5, 154: 0, 155: 0, 156: 0, 157: 0, 158: 0, 159: 4, 160: 4, 161: 0, 162: 21, 163: 22, 164: 23, 165: 5, 166: 1, 167: 5, 168: 4, 169: 4, 170: 24, 171: 1, 172: 1, 173: 1, 174: 4, 175: 4, 176: 25, 177: 5, 178: 5, 179: 4, 180: 0, 181: 0, 182: 0, 183: 0, 184: 0, 185: 0, 186: 1, 187: 1, 188: 0, 189: 5, 190: 4, 191: 0, 192: 4, 193: 4, 194: 0, 195: 0, 196: 0, 197: 0, 198: 0, 199: 0, 200: 0, 201: 0, 202: 1, 203: 5, 204: 4, 205: 5, 206: 1, 207: 5, 208: 4, 209: 4, 210: 4, 211: 4, 212: 4, 213: 4, 214: 1, 215: 1, 216: 5, 217: 1, 218: 1, 219: 1, 220: 1, 221: 5, 222: 5, 223: 5, 224: 5, 225: 1, 226: 0, 227: 0, 228: 5, 229: 1, 230: 1, 231: 1, 232: 1, 233: 1, 234: 1, 235: 5, 236: 26, 237: 5, 238: 5, 239: 27, 240: 4, 241: 4, 242: 28, 243: 4, 244: 29, 245: 4, 246: 4, 247: 4, 248: 4, 249: 4, 250: 4, 251: 4, 252: 4, 253: 4, 254: 4, 255: 4, 256: 5, 257: 4, 258: 5, 259: 4, 260: 5, 
261: 4, 262: 5, 263: 4, 264: 5, 265: 4, 266: 4, 267: 4, 268: 30, 269: 4, 270: 31, 271: 32, 272: 4, 273: 5, 274: 5, 275: 4, 276: 4, 277: 1, 278: 1, 279: 1, 280: 1, 281: 1, 282: 1, 283: 1, 284: 5, 285: 1, 286: 5, 287: 5, 288: 33, 289: 4, 290: 34, 291: 5, 292: 35, 293: 1, 294: 5, 295: 5, 296: 36, 297: 37, 298: 38, 299: 0, 300: 39, 301: 40, 302: 1, 303: 1, 304: 41, 305: 5, 306: 4, 307: 4, 308: 4, 309: 4, 310: 4, 311: 4, 312: 4, 313: 42, 314: 5, 315: 5, 316: 0, 317: 1, 318: 0, 319: 0, 320: 0, 321: 5, 322: 4, 323: 4, 324: 0, 325: 5, 326: 5, 327: 4, 328: 4, 329: 4, 330: 0, 331: 0, 332: 0, 333: 5, 334: 0, 335: 0, 336: 0, 337: 0, 338: 0, 339: 4, 340: 4, 341: 0, 342: 43, 343: 44, 344: 45, 345: 5, 346: 1, 347: 5, 348: 4, 349: 4, 350: 46, 351: 1, 352: 1, 353: 1, 354: 4, 355: 4, 356: 3, 357: 5, 358: 2, 359: 0}
We can estimate some global properties of this thresholded network too:
print(f"Global Efficiency = {networkx.global_efficiency(st50G)}")
clustering = networkx.clustering(st50G, weight='weight')
clustering = numpy.array(list(clustering.values()))
print(f"Mean Clustering Coefficient = {numpy.mean(clustering)}")
# Takes a very long time if you uncomment!
#small_worldness = networkx.sigma(st50G, niter=100, nrand=10, seed=None)
#print(f"Small-worldness = {small_worldness}")
Global Efficiency = 0.3798700461306392
Mean Clustering Coefficient = 0.33174925463501714
4. Network visualisation¶
In this section we'll provide a few ideas for how to visualise and present your network.
First, let's get some important attributes about brain area names and subnetworks:
# Function to transform our list of brain areas into a dictionary
def Convert(lst):
res_dct = {i : lst[i] for i in range(0, len(lst))}
return res_dct
# Add brain areas as attribute of nodes
networkx.set_node_attributes(G, Convert(lineList), 'area')
# Add node colors
networkx.set_node_attributes(G, Convert(colorlist), 'color')
# Add subnetwork attribute
networkx.set_node_attributes(G, Convert(sublist), 'subnet')
# Add node color numbers
networkx.set_node_attributes(G, Convert(colornumbs), 'colornumb')
4.1 Spring plot¶
Now we will create a standard spring network plot, though this could also be made circular by changing to draw_circular. We'll return to the fully-connected G network (since defining the minimum spanning tree later can fail if the network is not dense enough).
We set the edge widths to the weights raised to the power of 2, so that weak connections appear thinner.
# Standard Network graph with nodes in proportion to Graph degrees
plt.figure(figsize=(30,30))
edgewidth = [ d['weight'] for (u,v,d) in G.edges(data=True)]
pos = networkx.spring_layout(G, scale=5)
networkx.draw(G, pos, with_labels=True, width=numpy.power(edgewidth, 2), edge_color='grey', node_size=normstrengthlist*20000,
labels=Convert(lineList), font_color='black', node_color=colornumbs/10, cmap=plt.cm.Spectral, alpha=0.7, font_size=9)
#plt.savefig('network.jpeg')
Obviously that is WAY too much information. Let's visualise the Minimum Spanning Tree instead.
4.2 Minimum Spanning Tree¶
GMST = networkx.minimum_spanning_tree(G, weight='distance')
plt.figure(figsize=(15,15))
networkx.draw(GMST, with_labels=True, alpha=0.7, labels=Convert(lineList), font_size=9)
#networkx.draw(GMST, with_labels=True, width=numpy.power(edgewidth, 0.5), edge_color='grey', node_size=normstrengthlist*200,
# labels=Convert(lineList), font_color='black', node_color=colornumbs/10, cmap=plt.cm.Spectral, alpha=0.7, font_size=9)
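A minimum spanning tree of a connected N-node graph always has exactly N-1 edges and no cycles, which is easy to verify on a toy graph with random distances:

```python
import random
import networkx

random.seed(0)
G = networkx.complete_graph(8)
for u, v in G.edges():
    G[u][v]['distance'] = random.random()   # arbitrary toy distances

T = networkx.minimum_spanning_tree(G, weight='distance')
print(T.number_of_edges())       # 7 = N - 1
print(networkx.is_tree(T))       # True: connected, no cycles
```

The same checks (`GMST.number_of_edges()` should be 359, `networkx.is_tree(GMST)` should be True) apply to the brain-network MST above.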
4.3 Circular plots¶
For the sake of a less overwhelming plot, we will work with the minimum spanning tree for the circos plot.
networkx.set_node_attributes(GMST, dict(GMST.degree(weight='weight')), 'strength')
networkx.set_node_attributes(GMST, Convert(lineList), 'area')
networkx.set_node_attributes(GMST, Convert(colorlist), 'color')
networkx.set_node_attributes(GMST, Convert(sublist), 'subnet')
G_distance_dict2 = {(e1, e2): 1 / abs(weight) for e1, e2, weight in GMST.edges(data='weight')}
# Then add them as attributes to the graph
networkx.set_edge_attributes(GMST, G_distance_dict2, 'distance')
GMST_GRL = networkx.relabel_nodes(GMST, {i: lineList[i] for i in range(len(lineList))})
# CircosPlot
fig, ax = plt.subplots(1,1,figsize = (30,30));
circ = circos(GMST_GRL, edge_alpha_by="weight", node_color_by="subnet")
plt.show();
That is all for this notebook - have fun comparing the fMRI, DTI and MIND networks on the measures above, or relating these measures to properties of the subjects such as age or sex (rather than only looking at the group average). These properties can be found in the metadata CSV file:
metadata = pandas.read_csv('metadata.csv')
print(metadata.head())
  participant  age_days     sex
0  sub-100307     10135  Female
1  sub-100408     12325    Male
2  sub-101107      8310    Male
3  sub-101309     10865    Male
4  sub-101915     13055  Female
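As a starting point for the suggested per-subject analysis, the sketch below correlates each subject's mean node strength with age. The arrays here are simulated stand-ins (so the sketch is self-contained); with the real data you would use `fmri_connectivity` and `metadata['age_days']` instead:

```python
import numpy
from scipy import stats

rng = numpy.random.default_rng(42)

# Simulated stand-ins: 100 subjects x 360 x 360 connectomes, and ages in days
n_subj, n_roi = 100, 360
conn = rng.random((n_subj, n_roi, n_roi))
conn = (conn + conn.transpose(0, 2, 1)) / 2   # symmetrise each connectome
ages = rng.integers(8000, 14000, size=n_subj)

# Mean node strength per subject: average row sum of that subject's connectome
mean_strength = conn.sum(axis=2).mean(axis=1)

r, p = stats.pearsonr(mean_strength, ages)
print(f"r = {r:.2f}, p = {p:.3f}")            # no real effect expected for random data
```

The same pattern works for any of the graph metrics above: compute them per subject, then regress against the metadata columns.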
5. Acknowledgements¶
This tutorial is adapted from "Centeno, E.G.Z., Moreni, G., Vriend, C. et al. A hands-on tutorial on network and topological neuroscience. Brain Struct Funct 227, 741–762 (2022)." https://github.com/multinetlab-amsterdam/network_TDA_tutorial/tree/main.